DOCUMENT RESUME

ED 300 381                                        TM 012 169

TITLE          Illinois Initiatives for Education Reform. Test
               Preparation Program for Gifted and Talented
               Sophomores: 1986 Summer Program, Evaluation
               Report.
INSTITUTION    Chicago Board of Education, Ill. Dept. of Research
               and Evaluation.
PUB DATE       87
NOTE           38p.; This document is printed on blue paper.
PUB TYPE       Reports - Evaluative/Feasibility (142)



EDRS PRICE     MF01/PC02 Plus Postage.
DESCRIPTORS    Academically Gifted; *Achievement Gains; College
               Entrance Examinations; Grade 10; High Schools; *High
               School Students; *Program Evaluation; Scores; Student
               Attitudes; *Summer Programs; Talent; Teacher
               Attitudes; *Test Coaching; Test Wiseness
IDENTIFIERS    *National Merit Scholarship Qualifying Test;
               *Preliminary Scholastic Aptitude Test



ABSTRACT 

The fourth year of the Test Preparation Program for
Gifted and Talented Sophomores (TPPGTS) is evaluated. This 6-week,
75-hour test coaching program was developed to teach high-achieving
students principles/strategies required for doing well on the
Preliminary Scholastic Aptitude Test (PSAT)/National Merit
Scholarship Qualifying Test to increase the number of National Merit
semifinalists and finalists. The TPPGTS emphasized language arts,
mathematics, and guidance, and was conducted at the Lane, Lindblom,
and Curie High Schools in Chicago, Illinois. Pretests and posttests
were completed by 93 of the 148 enrollees; 72 of these took the
October PSAT. The comparison group initially included 137; of these,
47 completed practice tests and 37 took the PSAT. There were nine
teachers and three teacher aides in the TPPGTS. Student and teacher
questionnaires and classroom observations were analyzed concerning
the degree to which summer program students outperformed the
comparison group from pre- to posttest and whether this effect was
attributable to the program. The TPPGTS appeared to be implemented as
designed; an overall positive math program effect equivalent to 42
SAT points was found. Neither general nor site-specific verbal
coaching effects were seen. Students were generally satisfied with
the TPPGTS and thought they had learned effective strategies.
Teachers noted the diversity of instructional materials and
activities, and made recommendations for improving the program.
(SLD)



********************************************************************* 

* Reproductions supplied by EDRS are the best that can be made 

* from the original document. 





U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement

EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this docu-
ment do not necessarily represent official
OERI position or policy.



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC) " 



Illinois Initiatives for 
Education Reform 





Chicago Public Schools 

Manford Byrd, Jr.
General Superintendent of Schools






DEPARTMENT OF RESEARCH AND EVALUATION 
BUREAU OF PROGRAM EVALUATION 






ILLINOIS INITIATIVES FOR EDUCATION REFORM 

TEST PREPARATION PROGRAM FOR GIFTED 
AND TALENTED SOPHOMORES 

1986 Summer Program
Evaluation Report 



Chicago Public Schools 
Manford Byrd, Jr. 
General Superintendent of Schools 



It is the policy of the Board of Education of the 
City of Chicago not to discriminate on the basis of 
race, color, creed, national origin, religion, age, 
handicaps unrelated to ability, or sex in its edu-
cational program or employment policies or practices.



BOARD OF EDUCATION 
CITY OF CHICAGO 



Frank W. Gardner, President 
William M. Farrow, Vice-President 

Clark Burrus 
Ms. Linda Coronado 
Ms. Frances Davis 
Ms. Mattie Hopkins 
Ms. Ada N. Lopez 
George Munoz
Mrs. Patricia O'Hern 

Michael Penn 
Mrs. Winnie Slusser 



Chicago Public Schools 

Manford Byrd, Jr. 
General Superintendent of Schools 

Howard Denton 
Assistant to the General Superintendent 

Carole Perlman 
Acting Administrator 
Department of Research and Evaluation 



Acknowledgments 



The program evaluated in this report was funded under the Illinois 
Initiatives for Education Reform. Program development and implementation were
jointly administered by the Office of Programs for Gifted and Talented 
Students and the Bureau of Guidance Programs and Services. 

The evaluation was designed and conducted by the Bureau of Program 
Evaluation, Department of Research and Evaluation, and coordinated by 
Geraldine L. Oberman. The evaluation specialist was Arthur Reynolds. Staff 
from the Department of Research and Evaluation provided clerical, technical, 
and supportive assistance. 






Table of Contents

Executive Summary
Introduction
Research Perspective
Evaluation Questions
Evaluation Design
Student Information
Teacher Selection
Program Description
Instruments
Procedure
Results
Program Implementation
Test Results
PSAT Scores by Stanine Group
Analysis within the Summer Program Group
Supplemental Reports of Program Effects
October 1986 Results
Discussion
Establishing the Validity of Observed Program Effects
Threats to Internal Validity Ruled Out
Threats Not Ruled Out as Explanations for Program Effects
The Status of the Verbal Coaching Component
Threats to External Validity
The Status of the Program
Comparability with Other Studies
Summary
Recommendations
References

Tables

1. Distribution of TAP Stanine Scores by Summer Program and Comparison Groups
2. Self-Reported High School English and Math Records by Group
3. Mean Practice PSAT Pre- and Posttest Scores by Group and Site
4. Mean Practice PSAT Scores by TAP Reading and Math Stanine Groups
5. Mean PSAT Scores of Students Taking the October 1986 Test by Group



Executive Summary 



This report examines the fourth year (1986) of the Test Preparation
Program for Gifted and Talented Sophomores (TPPGTS). The TPPGTS is a six-
week, 75-hour coaching program developed to teach high achieving students test
principles and strategies necessary to do well on the Preliminary Scholastic
Aptitude Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT). The
test is administered every year in October. The central goal of the program
is to produce more National Merit semi-finalists and finalists than would be
expected without such a program. The program was conducted at three sites,
Lane, Lindblom, and Curie High Schools, and ran from June 27, 1986, to July
31, 1986. Students were recruited citywide. Ninety-three (93) out of 148
students (63%) completed practice pre- and posttests during the program while
72 summer program students took the October 1986 PSAT. One hundred thirty-
seven comparison group students were initially identified and 47 (34%)
completed practice pre- and posttests. Of this group, 37 took the October
1986 PSAT.

The major evaluation question addressed was the degree to which summer 
program students outperformed the comparison group from pre- to posttest and
whether this effect was attributable to the program. Also of interest was the 
effectiveness of program implementation and student and teacher perceptions of 
the program. The primary results of the evaluation are as follows: 

A. The program appeared to be implemented as designed and students were 
on-task during observation periods. A variety of instruction materials 
and activities were employed, although the diversity of instructional 
approaches may have compromised program uniformity across sites. 

B. Results of pre- and posttest PSAT scores indicated a substantial math
program effect at Lane as summer program students outgained their
comparison group by 7.1 points. Results were more modest at Lindblom
(summer program gain of 3.3 points) and Curie (1.4 points). An overall
positive math program effect of 4.2 points (42 SAT points) was found.
However, general or site-specific verbal coaching effects were not
found as comparison group students slightly outperformed summer program
students (1.9 to 1.5 point gains).

C. Stanine group analysis did not support higher PSAT gain scores for 
higher stanine groups. Actual score differences between stanine groups 
indicated that average seventh and eighth stanine students are least 
likely to become National Merit semi-finalists. 

D. Students were generally satisfied with TPPGTS as they indicated they 
became better prepared to take the PSAT, thought the materials were 
effective, and learned a variety of test-taking strategies. However, 
despite their training, one-third of the program students indicated
they would not usually guess if they didn't know an answer to a
problem. This is a large percentage given the emphasis that is placed
on making educated guesses.

E. Teachers noted the diversity of instruction materials and activities 
used in the program. Primary activities included discussion, inde- 
pendent seatwork, oral recitation, and demonstration. The teachers 
also made program recommendations including improving the criteria of 
selection, changing the program to after-school, and providing more 
teacher in-service training sessions. 






Thus, the major outcome of this study was the substantial impact of the 
math coaching program, especially at Lane. This effect was traced to the 
systematic use of former PSATs during the program and was critically analyzed 
for internal and external validity. These results are more positive than 
other math coaching studies. The verbal coaching program results were a 
disappointment but are not supported by results of the 1985 and 1984 programs. 
An emphasis on practice and drill with former PSATs is suggested. Future
programs should also mandate uniform program implementation to duplicate
program effects across sites. The following recommendations are also made: 

1. Emphasize practice and drill with past PSATs. 

2. Train teachers about the most effective ways to coach the test. 

3. Standardize the instruction and materials. 

4. Improve the student selection process. 

5. Refine the identification and selection of gifted students to be
served by this program. 

6. Change the program to a general SAT/ACT preparation program. 

7. Give students an incentive for participating in the program. 

8. Since coaching programs produce short-term gains and do not improve 
cognitive skills necessary for doing well in college, emphasis 
should be given to programs that develop long-term cognitive
skills. 






Introduction



This report examines the fourth year (1986) of the Test Preparation Pro- 
gram for Gifted and Talented Sophomores (TPPGTS). The TPPGTS is a six-week 
intensive coaching program developed to teach high achieving students test 
principles and strategies necessary to do well on the Preliminary Scholastic 
Aptitude Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT). The
test is administered every year in October and is composed of verbal and 
quantitative items deemed relevant for college preparation. The central goal 
of the program is to produce more National Merit semi-finalists and finalists 
than would be expected without such a coaching program. The program ran from 
June 27, 1986, to July 31, 1986, and was approved and authorized by the Board 
of Education on April 30, 1986. The program recruited students citywide and 
was located at three sites: Lane, Lindblom, and Curie high schools. 

Research Perspective 

Test coaching has had a short but controversial history in education. 
During the 1950's, early studies of the effects of practice and coaching
on test performance were conducted in Great Britain and dealt with tests used
to assign children to different secondary schools (Anastasi, 1981). It was
typically found that the degree of improvement depended on ability, edu- 
cational experience, type of coaching, and the underlying characteristics of 
the test. Coaching programs for the Scholastic Aptitude Test (SAT) were also 
being initiated at this time. Early studies showed promising results despite 
pronouncements by the Educational Testing Service (ETS) and College Entrance 
Examination Board (CEEB) that such coaching was ineffective. 

Since these early developments there has been extensive research on the 
effects of coaching on test performance, especially regarding the SAT. Dyer 
(1953), of ETS, conducted one of the first SAT coaching studies and found that 
program students made significant mean gains over the control group of 12.9 
points on the math subtest and 4.6 points on the verbal subtest. The 10-hour 
program was composed of drill and vocabulary development on items developed by 
ETS to be compatible with the SAT. Other studies of relatively short coaching 
programs conducted by ETS (Dear, 1958; French, 1955) showed even higher 
significant gain scores for SAT-coached students. 

To synthesize the plethora of SAT and PSAT coaching studies, two exten-
sive literature reviews (Messick & Jungeblut, 1981; Slack & Porter, 1980) and
a meta-analysis (DerSimonian & Laird, 1983) have been completed. Slack & 
Porter (1980) examined the results of 19 published SAT coaching studies, 
including those mentioned above, and found average gain scores of 29 and 33 
points on the verbal and math subtests, respectively, although many of these 
studies did not include control groups. 

Messick & Jungeblut (1981) confirmed the significant positive effects 
of SAT coaching studies but indicated that observed gains were reduced for 
controlled or randomized designs. They also found that program effects 
increased with student contact time. Using a logarithmic function analysis,
they computed average score effects as related to the number of contact hours
for the 17 verbal and 14 math studies. From this model it was estimated that 
a verbal gain of 10 points would require 12 hours of instruction and a verbal
gain of 20 points would require 57 hours. For the math subtest, 20 score 
points would require 19 hours of instruction, 30 score points would require 45 
hours of instruction, and 40 points would require 107 hours. DerSimonian & 
Laird (1983), in a meta-analysis of published SAT test preparation programs, 
found that program effects differed by the type of evaluation model employed. 
Using many of the same studies as the above reviews, they found average 
program effects to be approximately 10 points on the verbal and math subtests 
for matched and randomized studies. 
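
The contact-hour figures quoted above can be approximated with a simple logarithmic curve. The sketch below is an illustration only and is not Messick & Jungeblut's actual model: it assumes gain = a + b ln(hours), fits that curve by least squares to the hour/gain pairs stated in the text, and evaluates it at 75 hours, the length of the TPPGTS. All function and variable names are hypothetical.

```python
import math

# (contact hours, gain in SAT points) pairs quoted in the text above.
MATH_POINTS = [(19, 20), (45, 30), (107, 40)]
VERBAL_POINTS = [(12, 10), (57, 20)]


def fit_log_curve(points):
    """Least-squares fit of gain = a + b * ln(hours); returns (a, b)."""
    xs = [math.log(hours) for hours, _ in points]
    ys = [gain for _, gain in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predicted_gain(points, hours):
    """Gain (SAT points) that the fitted curve implies for a given number of hours."""
    a, b = fit_log_curve(points)
    return a + b * math.log(hours)


# Assumed use: what the fitted curves imply for a 75-hour program.
math_at_75 = predicted_gain(MATH_POINTS, 75)
verbal_at_75 = predicted_gain(VERBAL_POINTS, 75)
print(f"math gain at 75 hours:   {math_at_75:.1f} SAT points")
print(f"verbal gain at 75 hours: {verbal_at_75:.1f} SAT points")
```

Under this assumed curve, a 75-hour program falls between the 45-hour and 107-hour math benchmarks (30 to 40 points) and slightly above the 57-hour verbal benchmark (20 points).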

Thus, despite early pronouncements by ETS that special coaching can do 
little to improve SAT scores, research has consistently shown that 
statistically significant gains due to coaching can be expected from typical 
test preparation programs and gains generally increase with student contact 
hours. Based on the above studies, estimated average SAT coaching gains 
appear to be 10-20 points on the verbal and math subtest, although the verbal 
subtest is less amenable to coaching than the math subtest. However, it 
should be remembered that the studies cited varied considerably in the kind of 
instruction/materials used and may not represent the best available means to 
producing maximum coaching gains. For example, the recent availability of 
past SAT forms provides more advantageous means for practice and drill on SAT 
items. The few randomized and/or highly controlled studies actually completed
and their short duration further limit the above results. It is imperative
that studies be undertaken with more diverse populations and instructional
approaches before the magnitude of coaching effects is confirmed. Most
reported studies include graduating seniors in private high schools and are
not generalizable to all student groups such as gifted students, for example.

Results of the past two years (1984-1985) of TPPGTS indicated average 
gains of 5.6 and 3.8 points on the verbal and mathematics subtests, equivalent 
to 56 and 38 points on the SAT scale (200-800 range) (Chicago Public Schools, 
1986). While these results are much greater than results of the above meta- 
analyses, they are tempered in two ways: (1) the evaluations did not employ 
adequate control group designs, thus probably overestimating program effects,
and (2) the length of the programs (75 hours) and the type of students 
enrolled (gifted) would be expected to produce higher gain scores than typical 
test preparation programs. 

The present evaluation of the fourth year of the TPPGTS takes into 
account the major limitations of previous years by employing a pretested
control group design to determine program effects. Program and comparison
group students were tested at the beginning and end of the summer program with
past forms of the PSAT. Classroom observations and questionnaires were also
employed to assess the implementation of, and participant satisfaction with,
the TPPGTS. In addition, results of the October 1986 PSAT were obtained and
subsequently compared across groups for stability. The use of October 1986
PSAT scores in this way differs from previous evaluations, in which program
effects were related to the October test. With the addition of comparison 
group pretests, it is no longer advantageous to compare October test results 
for determination of program effects. 






Evaluation Questions



The following evaluation questions were addressed in this study to 
measure program effectiveness: 

A. Was the program implemented as designed? Were the instruction 
materials compatible with this design? 

B. Do students participating in the summer program show greater
improvement than comparison group students from pre- to posttest and 
was this improvement stable up to the October 1986 PSAT? 

C. What was the students' assessment of the program and its materials? 
What did they learn from the program? 

D. What were the perceptions of the teachers in the program regarding 
its content? What perceptions did teachers attribute to students? 

These questions will be of particular interest to groups who have a 
vested interest in this program including (1) students interested in preparing 
for standardized tests, (2) teachers devoted to test preparation instruction 
and facilitating testwiseness among their students, (3) counselors who are 
most interested in assisting students with academic and career plans, (4) 
school administrators in general who are crucial in communicating the 
relevance of test preparation to students, and (5) the educational and 
scientific community at large who can help facilitate and develop the 
evaluation of test preparation programs. 



Evaluation Design 

In order to test the central evaluation question, whether summer program
students show greater growth on pre- and posttest PSAT tests than comparison
group students, an untreated control group design was used (Cook & Campbell,
1979). This generally interpretable nonequivalent control group design tests
criterion score differences between groups before and after the program. 
Determination of program effects is estimated to be the difference between 
mean criterion score growth of experimental and comparison groups, that is, 
the mean score differences from pre- to posttest between summer program and 
comparison group students on PSAT verbal and mathematics subtests. To 
determine the stability of program gain scores, results of the October 1986 
PSAT were obtained and analyzed for fitness. While this evaluation design is 
generally interpretable, it is less desirable than a control group design with 
controlled selection whereby students are randomly assigned to treatment 
conditions. Thus, selection differences between groups can bias results. The 
ethical question of whether to exclude students from participation in a 
program with documented positive effects precluded employment of this more 
powerful design. 
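
The effect estimate described above is a difference between two mean gains. The following is a minimal sketch of that arithmetic, using only the gain figures quoted in the Executive Summary; the function names are hypothetical, and the tenfold PSAT-to-SAT conversion follows the report's own equivalence (4.2 PSAT points reported as 42 SAT points).

```python
def program_effect(program_gain, comparison_gain):
    """Program effect in PSAT points: program group gain minus comparison group gain."""
    return program_gain - comparison_gain


def to_sat_scale(psat_points):
    """PSAT scores run 20-80 and SAT scores 200-800, so one PSAT point is ten SAT points."""
    return psat_points * 10


# Verbal gains quoted in the Executive Summary: program 1.5, comparison 1.9.
verbal_effect = program_effect(1.5, 1.9)
print(round(verbal_effect, 1))             # -0.4 PSAT points
print(round(to_sat_scale(verbal_effect)))  # -4 SAT points

# Overall math effect quoted in the Executive Summary: 4.2 PSAT points.
print(round(to_sat_scale(4.2)))            # 42 SAT points
```

A negative value means the comparison group gained more than the program group, which is the pattern the summary reports for the verbal subtest.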

Student Information 

One hundred forty-eight (148) students enrolled in TPPGTS at three sites 
and were obtained from over 1400 students eligible to participate. 
Eligibility, as defined by program administrators, was restricted to students
who scored at or between the seventh and ninth stanines in reading and math on
the Fall 1985 Tests of Achievement and Proficiency (TAP). Fifty-seven (57)
percent of program students initially attended the program at Lane while 19%
(28) and 24% (35) attended at Lindblom and Curie, respectively. Sixty-three
(63) percent or 93 of the participating students completed the program and
pre- and posttests.
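
The enrollment and completion percentages can be checked against the counts the report gives. The sketch below is only an arithmetic check with hypothetical names; the Lane count of 85 is not stated in the report and is derived here as 148 minus the Lindblom and Curie counts, and the comparison group counts (47 of 137) come from the Executive Summary.

```python
def percent(part, whole):
    """Whole-number percentage, rounded as in the report."""
    return round(100 * part / whole)


ENROLLED = 148
site_counts = {"Lindblom": 28, "Curie": 35}
# Assumption: Lane's count is whatever remains of the 148 enrollees.
site_counts["Lane"] = ENROLLED - sum(site_counts.values())

site_shares = {site: percent(count, ENROLLED) for site, count in site_counts.items()}
program_completion = percent(93, ENROLLED)  # completed pre- and posttests
comparison_completion = percent(47, 137)    # comparison group, from the summary

print(site_shares)
print(program_completion, comparison_completion)
```

The two completion rates differ substantially, which is why attrition matters when comparing gains between the groups.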

Demographic information obtained from students completing both pre- and
posttests indicated that 51% were female, 49% were male, and 79% will be in
their third year of high school as of September 1986. Twenty-one (21) percent
will be sophomore high school students. Nearly all program students (94%)
plan to take the October 1986 PSAT and 44% took the PSAT prior to the program.
Eleven (11) percent of the students have been involved in other test
preparation programs. In addition, 22 schools were represented in the program
as 30% of enrolled students regularly attended Lane; 11% attended Curie; 10%
attended Young, and 10% attended Lincoln Park. Kenwood and Von Steuben
students each accounted for 9% of program students. See supplemental report
documentation for a more detailed account of evaluation instrument responses.

Regarding the nature of their enrollment, 76% of the students reported
registering for the program because they wanted to learn to do better on
tests. Sixty-four (64) percent of the students indicated they enrolled
because they thought the program would be interesting. Other reasons cited by
students for participating in the program were they had nothing else planned
for the summer (32%), their parents expected them to (30%), and they needed
the English review (29%). It should also be noted that many eligible students
were not sent letters of participation because of errors in mailing. While
some of these students were finally contacted through notices sent to
potential comparison group students, it is likely these letters did not
compensate for those not sent.

The 137 comparison group students were also obtained from the pool of 
eligible students through letters sent independently of the summer program. 
However, these letters were restricted to those eligible students who attended 
Lane, Lindblom, Curie, Young, and Kenwood. Comparison group students, who may 
not have had time to participate in the program, responded to letters of 
invitation to take two practice PSAT's before and after the summer program at 
Lane and Curie in preparation for the October 1986 PSAT. Thirty-four percent
(47) of the students tested completed pre- and posttests. As with the summer
program group, a majority of students responding to an information sheet were
female (53%) and were in their third year of high school (82%) as of September
1986. In addition, all students indicated they were planning to take the
October 1986 PSAT while 22% of responding students had taken the PSAT before.
Twenty-six (26) percent of the comparison group students also indicated they will
study for the October 1986 test. Further, over one-half of the comparison
group students (65%) regularly attended Lane and 26% attended Young. Seven
and two percent of participating students attended Kenwood and Lindblom,
respectively. 

Also notable were the major reasons comparison group students did not
enroll in the summer program. These included vacation, illness, or a summer
job (54% of students), willingness to study for the PSAT on their own (26%),
summer courses that conflicted with the program (26%), and letters about the
program that arrived too late (15%). An additional 24% of the students
indicated other reasons for not enrolling in the program, most frequent of
which was they never received notification of the program. This finding is
reflective of the fact that letters were not sent to all eligible students.



TABLE 1

Distribution of TAP Stanine Scores
by Summer Program and Comparison Groups
(Summer Program N=93, Comparison Group N=47)

               READING                           MATH
          Summer    Comparison             Summer    Comparison
Stanine   Program   Group        Stanine   Program   Group

   7        33%       34%           7        40%       34%
   8        40%       32%           8        36%       43%
   9        27%       34%           9        24%       23%

 Total     100%      100%         Total     100%      100%
 Mean       7.9       8.0         Mean       7.8       7.9


Table 1 displays fall 1985 TAP stanine scores by summer program and 
comparison group students who completed both pre- and posttests. As is 
shown, mean reading (7.9 and 8.0) and math (7.8 and 7.9) stanine scores are
approximately equal between summer program and comparison group students,
respectively. Thus, the academic achievement of both groups on these measures
is equivalent. The distribution of TAP stanine scores between groups is also 
similar. On the reading subtest, comparison group students were evenly 
divided among stanine groups, while stanine eight summer program students had 
a slightly higher proportion of students (40%) than the comparison group. On 
the math subtest, differences between groups occurred at the seventh and 
eighth stanines. The highest proportion of summer program students was at the 
seventh stanine (40%), the reverse of the comparison group, in which 43% were 
at the eighth stanine. It should also be noted that TAP subtest and total 
test scaled scores were also compared between groups. Results were similar to 
the above stanine data as no significant differences existed. 
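
As a consistency check on Table 1, the group means can be recomputed from the percentage distributions as weighted averages of stanines 7 to 9. This is a sketch with hypothetical names; it assumes the first pair of columns in Table 1 is reading and the second is math, following the order used in the text.

```python
# Share of each group at stanines 7, 8, and 9, as read from Table 1.
STANINE_SHARES = {
    ("reading", "summer program"): {7: 0.33, 8: 0.40, 9: 0.27},
    ("reading", "comparison"): {7: 0.34, 8: 0.32, 9: 0.34},
    ("math", "summer program"): {7: 0.40, 8: 0.36, 9: 0.24},
    ("math", "comparison"): {7: 0.34, 8: 0.43, 9: 0.23},
}


def mean_stanine(shares):
    """Weighted mean stanine for one group on one subtest."""
    return sum(stanine * share for stanine, share in shares.items())


MEANS = {key: round(mean_stanine(shares), 1) for key, shares in STANINE_SHARES.items()}
for (subtest, group), mean in MEANS.items():
    print(f"{subtest:8s} {group:15s} {mean}")
```

The recomputed values agree with the means printed in the table (7.9 and 8.0 for reading, 7.8 and 7.9 for math).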






Table 2

Self-Reported High School English and Math Records by Group
(Summer Program N=91, Comparison Group N=46)

ENGLISH COURSES
                              Summer Program          Comparison
                              %          Mean         %          Mean
Class                         Enrolled   Grade        Enrolled   Grade

English 1                       96       3.19           100      2.79
English 2                       94       3.04           100      2.95
English 3                       11       2.50            11      3.00

Mean # Courses
Enrolled/Grade                  2.1      3.10            2.1     2.96

MATH COURSES
                              Summer Program          Comparison
                              %          Mean         %          Mean
Class                         Enrolled   Grade        Enrolled   Grade

8th grade algebra               46       3.19            42      3.13
Algebra I                       97       3.20            98      3.10
Geometry                        92       3.01            98      2.98
Algebra II                      28       3.35            22      2.67
Trigonometry                    32       2.96            24      3.70

Mean # Courses
Enrolled/Grade                  3.0      3.09                    2.94




Further academic information between groups is reported in Table 2. As
displayed, the proportion of summer program and comparison students having
taken basic English and math courses is nearly equivalent across all courses.
Over 90% of both groups have taken English 1, English 2, Algebra I, and
Geometry. In addition, self-reported math and English grades between groups
were similar as average course grades indicate. The only noticeable
difference between groups occurred in Algebra II where the mean grade of
program students (3.35 out of 4.00) was higher than comparison group students
(2.67). However, the small sample size of the comparison group and
instructional differences between classes may account for this difference.
Thus, the similarity of high school courses taken and grades received for the
groups strongly indicates the general academic equivalence of both groups.

Teacher Selection 

Nine teachers were assigned to the PSAT preparation program, three at 
each of the three sites. Teachers at each site specialized in the content 
areas of language arts, mathematics, or guidance. All teachers were selected
based on recommendations forwarded to program administration, experience in
test preparation, and general interest. Although teachers were experienced,
only one had been involved in prior test preparation programs. Consequently, 
they were not extremely knowledgeable about coaching the PSAT. One teacher 
aide was also provided at each site. 






Program Description 



The 75-hour summer program included approximately 60 hours of test 
preparation and was divided in three content areas: (1) language arts, (2) 
mathematics, and (3) guidance. It was organized around the content of the 
actual PSAT with special emphasis on test-taking strategies, testwiseness, and 
drill. Three main textbooks were used: (1) Barron's PSAT/NMSQT Guide, (2)
Strategies for Taking SAT Test, and (3) Gruber's Inside Strategies for the
SAT. At each site the program was divided into three groups of three classes
on the basis of math PSAT pretest scores. Each site subgroup rotated among 
subject areas on a daily basis. Each class period was 50 minutes in length. 

In language arts classes, students examined and reviewed vocabulary, 
sentence completion, verbal analogies, and reading comprehension sections 
of the PSAT. In the mathematics section of the course, basic mathematics, 
algebra, and geometry were emphasized. The guidance area explored awareness 
and understanding of test-taking strategies, values clarification, and career 
choices. Future plans and goals were also discussed in this component. 

With the exception of the primary textbooks, there was no uniform 
curriculum for each component. Concentrating on a particular area, each 
teacher was allowed the flexibility of selecting his/her own manner of 
presentation and instruction. However, practice test-taking, drill, and 
testwiseness strategies were emphasized throughout. 

Instruments 

Student questionnaire. This 41-item self-report questionnaire assessed
the content, instruction materials, and student understanding of testwiseness
principles of the program. Administered at the end of the program, survey
items were coded on a four-point Likert scale and were composed of items about
general program assessment (e.g., "I learned test-taking strategies that I did
not know before"), the effectiveness of materials ("How effective was Inside
Strategies for the SAT?"), and understanding specific testwiseness principles
("In this program I learned that I should usually guess when I don't know the
answer to a problem."). The higher an item was rated the more positive the
particular dimension was assessed.

Teacher questionnaire. A 9-item multiple choice and open-ended question-
naire was completed by program teachers which asked them to document skills
taught in the program, instructional strategies and evaluation criteria used, 
and problems encountered in implementation. Teachers also assessed the 
effectiveness of the instruction materials used in the program as well as its 
strengths and weaknesses. Recommendations for future test preparation 
programs were also solicited. 

Classroom observations. To assess the implementation of the test
preparation program, classroom observations were conducted at the three sites
by two staff members from the Department of Research and Evaluation. Each
subject area at the sites was visited once at the beginning of the program
with the exception of one subject, which was observed twice during two
different class periods. The subject areas of the double observations were
different at each site so the number of total observations within each subject
area was equivalent. These descriptive observations documented various
classroom variables including (1) the compatibility of the classroom
activities and the program outline, (2) student time-on-task, (3) type and
length of classroom activities, (4) materials used, and (5) the classroom
learning climate. At the completion of each observation, brief interviews
were completed with teachers that encompassed reactions to the materials,
student reactions to the content, and the ability of students.

The PSAT. To measure the academic growth of program students, past
PSATs were used, specifically forms S and T of the Fall 1985 PSAT. It should
be noted that Lane students took Form 2 of the 198:^ PSAT as the posttest. 
These tests were administered in alternate forms at the beginning and end of
the program to both summer program and comparison groups. In addition, to
assess the stability of PSAT scores, pre- and posttest scores were compared to
results of the October 1986 PSAT. The PSAT is a standardized achievement test
given to high school sophomores in preparation for the SAT. According to the
College Entrance Examination Board (CEEB) (1985), the sponsor and governing
body of the PSAT and SAT, the PSAT is "a multiple-choice test that measures
developed verbal and mathematical reasoning abilities important for academic
performance in college. It assesses ability to reason with facts and concepts
rather than the ability to recall and recite them." Items assess reading
comprehension, word meaning, sentence completion, basic arithmetic, algebra,
and geometry. It is highly correlated with the SAT. There are 115 questions
on the test, 65 in the verbal section and 50 in the mathematics section. Raw
scores are corrected for guessing and converted to linearly derived scaled
scores ranging from 20 to 80 with a mean of 50. Selection index scores are
used for merit scholar selection and are computed as twice the verbal score
plus the math score. Testing time is one hour and 40 minutes.
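
The selection index arithmetic described above can be illustrated with a
short sketch. The function name is hypothetical and chosen only for this
illustration; the input values are the total summer program pretest means
reported in Table 3 of this report.

```python
def selection_index(verbal: float, math: float) -> float:
    """Selection index as described in the text: twice verbal plus math."""
    return 2 * verbal + math


# Total summer program pretest means from Table 3 (verbal 42.8, math 48.8).
pretest_index = selection_index(verbal=42.8, math=48.8)
print(round(pretest_index, 1))

# Because each scaled score ranges from 20 to 80, the index ranges
# from 60 to 240.
lowest_index = selection_index(20, 20)
highest_index = selection_index(80, 80)
```

The rounded result agrees with the pretest selection index of 134.4 listed
for the total summer program group in Table 3.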

Procedure 

Eligible students were sent letters in May 1986 inviting them to participate
in the program. Although many eligible students did not receive these
letters, those who enrolled in the six-week program were assigned to one of
three sites, depending on which site was closest to their home or regular
school. Students at each site were divided into three groups on the basis of
their pretest scores and subsequently rotated among three teachers, one in
each of the three content areas of language arts, mathematics, and guidance.
On June 16, 1986, second letters were sent to eligible students at Lane,
Lindblom, Curie, Kenwood, and Young inviting them to take two practice tests
as comparison group students at either Lane or Curie. This was practical for
many students who did not have time to enroll in the program. All students,
summer program or comparison group, were administered alternate-form pre-
and posttest PSATs before and after the program. Additional background
information was also obtained. During the program, student and teacher
questionnaires were completed by participants and classroom observations were
conducted. October 1986 PSAT scores were also obtained for both groups.



Results 



Program Implementation 

The implementation of the TPPGTS was assessed by classroom observation
to ascertain the degree to which it was enacted as designed. Based on nine
observations conducted for the program, three at each site, it appeared the
program was implemented as intended. Observational records indicated that the
vast majority of students participated in classroom activities and were
on-task during class. In fact, all students observed during two classroom time
intervals were judged as being on-task by observers, although it was noted
that students were not always actively participating in classroom activities.

As intended, instructional activities were diverse and comprehensive.
For example, during the 456 minutes of actual class time observed
(approximately 51 minutes per observation), 36% of the instructional
activities were devoted to independent or small group completion of
assignments, 32% to listening to lectures or demonstrations, 23% to classroom
discussion and recitation in response to teacher questions, and 9% to other
group-related activities such as correcting answers to test questions. In
addition, completing assignments independently or in small groups was the
most frequent instructional activity, as noted in six of the nine
observations.
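
The observation figures above can be checked with a few lines of arithmetic.
This is only a sketch: it assumes the reported percentages describe shares of
the 456 observed minutes, which the text does not state explicitly, and the
variable names are illustrative.

```python
total_minutes = 456   # class time observed, as reported
observations = 9      # three observations at each of three sites

# Reported shares of instructional activity, in percent.
shares = {
    "independent or small-group assignments": 36,
    "lectures or demonstrations": 32,
    "discussion and recitation": 23,
    "other group-related activities": 9,
}

# Average length of one observation (reported as roughly 51 minutes).
mean_minutes = total_minutes / observations

# Approximate minutes per activity, ASSUMING the percentages are time shares.
approx_minutes = {name: total_minutes * pct / 100 for name, pct in shares.items()}

for name, minutes in approx_minutes.items():
    print(f"{name}: about {minutes:.0f} minutes")
```

The four shares sum to 100 percent, and the mean observation length rounds to
the 51 minutes cited in the text.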

The instructional activities observed consistently agreed with the
outline and goals of the course. English classes primarily included review
of vocabulary words and their root meanings and test practice. A variety of
supplemental materials was also used to satisfy this objective, including a
word study guide, How to Ace the SAT, and The College Pr^^o Game.

In the math section, lecture, classroom discussion, and practice
test-taking were primarily employed. Teachers commonly explained the rationale
behind correctly answering geometry and algebra items. In addition to the
primary texts, other materials included a test preparation workbook,
Mastering the SAT, and past PSAT math exams. For example, the math instructor
at Lane employed test practice and drill techniques from old PSATs to
familiarize students with the PSAT.

Guidance component activities appeared to fall into two categories: (1)
reviewing test-taking strategies and (2) exploring values, college, and career
choices. While reviewing test-taking strategies was most often handled
through lecture and class discussion, the latter activities were instituted as
group and class discussion activities. For instance, during one class
students filled out and discussed a career interest inventory. Following this
activity, a job-simulation exercise was completed in which student pairs
reacted to possible job situations. Supplemental materials guided these group
activities. Student progress was assessed by a variety of formal and informal
methods including verbal feedback, monitoring, discussion, and practice test
results.

Classroom observers also assessed the learning climate of the program.
Results showed that on a four-point scale, student behavior was adequately
controlled in the classroom (3.8), students were task-oriented throughout
class (3.4), students participated in class (3.6), and teachers facilitated
classroom learning (3.6). Classroom observers also noted that the diversity
of the instructional format stimulated student involvement (3.7), but that the
primary instructional materials were not consistently used during class (2.3).
This latter finding suggests the lack of a uniform instructional focus.

Informal teacher interviews were completed at the end of each observation
and indicated that the level of interest in the program was influenced by the
subgroup in which the students were placed. Teachers reported that motivation
and interest level declined for the lower ability subgroups. Some teachers
reported that the implementation of the program was negatively influenced by
the lack of regular student attendance and by the lack of math preparation of
many students, which changed the program from a review/test-taking focus to an
instructional one.

Thus, TPPGTS appeared to be implemented as designed, as nearly all
instructional activities and materials were used in accordance with the
outline of the course. However, it should be noted that observations were
few and conducted early in the program. To gauge the implementation of the
program comprehensively, additional observations should be conducted toward
the end of the program. In addition, as has been noted in previous reports,
many of the guidance activities were independent of test preparation, such as
values clarification, college, and career information activities. In fact, as
the guidance teachers indicated, less than one-half of all class time was
devoted to test preparation activities. Given that the program is designed
for test preparation, these guidance activities are clearly
incompatible with the program and may be unexpected by participating students.
To rectify this state of affairs, at the least the total content of the
program should be reflected in its title, and at best, the guidance component
should be purged of all such "nontest-taking" activities and incorporated with
the other components.

Test Results 

In order to assess the central question of the evaluation, that of
whether summer program students made greater gains from pre- to posttest than
the comparison group, a variety of statistical analyses was employed. First,
to determine general within-group program effects, a paired-sample correlated
t-test was used. This test is regarded as the most powerful for a paired,
correlated sample (Hays, 1981). Second, to determine general PSAT program
effects between summer program and comparison group students, Hotelling's
(1931) T² statistic was computed across sites for verbal and math subtests.
This multivariate statistic is also widely regarded as the most powerful test
for a p-variate, simultaneous comparison (Marascuilo & Levin, 1983). Third,
to determine substantive program effects, univariate analysis of covariance
was used for the verbal and math subtests. Although multivariate analysis of
covariance is generally more efficient, the verbal and mathematics subtests
are considered separate and independent tests and were analyzed along this
line.
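
The paired-sample t-test named above can be sketched as follows. This is a
minimal illustration, not the evaluators' actual computation: the function
name is hypothetical, and the six pre/post score pairs are invented solely to
show the arithmetic (they are not data from this report).

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean(diffs) / standard_error, n - 1


# HYPOTHETICAL scaled scores for six students, for illustration only.
pre_scores = [42, 45, 50, 38, 47, 44]
post_scores = [44, 46, 55, 40, 47, 48]

t_value, df = paired_t(pre_scores, post_scores)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The statistic is the mean of the individual gain scores divided by its
standard error, so it rewards gains that are consistent across students
rather than merely large on average.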

As shown in Table 3, paired t-test results indicated that summer program
students made significant pre- to posttest gains on the verbal (t=2.52, p<.05)
and math (t=7.82, p<.001) subtests. Their math subtest gain score (in points)
of 5.2 (48.8 to 54.0) was over 3 times greater than their verbal subtest gain



Table 3

Mean Practice PSAT Pre- and Posttest Scores by Group and Site
(standard deviations in parentheses)

                        VERBAL                      MATH                   SELECTION INDEX
Site/              Summer                     Summer                     Summer
Occasion           Program      Comparison    Program      Comparison    Program        Comparison

LANE (n=53/35)
  Pretest          42.0 (?)     43.9 (6.1)    48.7 (?)     50.4 (?)      132.6 (22.2)   138.3 (17.2)
  Posttest         43.3 (8.5)   46.5 (7.7)    56.1 (8.2)   50.7 (7.2)    142.6 (21.9)   143.6 (19.1)
  Gain             +1.3         +2.6**        +7.4***      +0.3          +10.0***       +5.3**

LINDBLOM (n=19/-)
  Pretest          43.7 (?)     -             49.0 (?)     -             136.5 (16.4)   -
  Posttest         46.2 (5.5)   -             52.3 (5.1)   -             144.6 (14.7)   -
  Gain             +2.5         -             +3.3*        -             +8.1**         -

CURIE (n=21/12)
  Pretest          44.0 (7.5)   45.4 (8.1)    48.9 (7.2)   51.2 (7.8)    136.9 (?)      142.0 (20.3)
  Posttest         45.1 (5.7)   45.3 (8.6)    50.3 (5.5)   52.8 (7.1)    140.4 (15.2)   143.3 (?)
  Gain             +1.1         -0.1          +1.4         +1.6          +3.5           +1.3

TOTAL (n=93/47)
  Pretest          42.8 (8.2)   44.3 (6.6)    48.8 (8.1)   50.6 (7.6)    134.4 (19.5)   139.3 (17.9)
  Posttest         44.3 (7.6)   46.2 (7.8)    54.0 (7.6)   51.2 (7.2)    142.5 (19.2)   143.6 (18.9)
  Gain             +1.5*        +1.9*         +5.2***      +0.6          +8.1***        +4.3*

Scores used are scaled scores and range from 20 to 80.
Selection Index (2*verbal + math)
Group sizes are given as summer program/comparison. There was no comparison
group at Lindblom. (?) marks a standard deviation that is illegible in the
source copy.

  * p<.05
 ** p<.01
*** p<.001



score of 1.5 (42.8 to 44.3). Although the comparison group, surprisingly, had
a higher pre- to posttest verbal gain than the summer program group (1.9
points) (t=2.30, p<.05), their math gain score of 0.6 was not statistically
significant. Thus, while both groups gained similarly on the verbal test, the
summer program group made substantial gains on the math subtest over and above
the comparison group. Mean total selection index (2*Verbal + Math) gain
scores were also considerably higher for the summer program group (8.1 versus
4.3 points).
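
The gain scores quoted in this paragraph follow directly from the total-group
means in Table 3. The sketch below recomputes them; the dictionary layout and
names are illustrative choices, while the pre- and posttest means are the
values printed in the table.

```python
# (pretest mean, posttest mean) for the TOTAL rows of Table 3.
totals = {
    ("summer", "verbal"): (42.8, 44.3),
    ("summer", "math"): (48.8, 54.0),
    ("comparison", "verbal"): (44.3, 46.2),
    ("comparison", "math"): (50.6, 51.2),
}

# Gain = posttest mean minus pretest mean, rounded to one decimal place.
gains = {key: round(post - pre, 1) for key, (pre, post) in totals.items()}

# How many times larger the summer program math gain is than its verbal gain.
math_to_verbal_ratio = gains[("summer", "math")] / gains[("summer", "verbal")]

for (group, subtest), gain in gains.items():
    print(f"{group:10s} {subtest:6s} gain = {gain:+.1f}")
print(f"summer program math/verbal gain ratio = {math_to_verbal_ratio:.2f}")
```

The recomputed gains match the tabled values of 1.5, 5.2, 1.9, and 0.6
points, and the ratio of about 3.5 supports the statement that the math gain
was over three times the verbal gain.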

Table 3 also indicates that pretest comparison group scores are higher
than those of the summer program group for verbal, math, and selection index
scores. However, independent t-test results indicated these average pretest
scores were not significantly higher than those of summer program students.
This further confirms the relative equivalence of groups before the onset of
the program.

Table 3 also provides a PSAT score breakdown by program site. Results
indicate that Lindblom program students gained 2.5 points on the verbal
subtest while program students elsewhere gained approximately one point.
Comparison group students gained very little on the verbal subtest with the
exception of those tested at Lane, whose gain was 1.3 points higher than that
of the summer program group. They were completely responsible for the gain of
the entire comparison group. At Curie, comparison group students had a
slightly negative gain score from pre- to posttest while the summer program
group gained about 1 point. There was no comparison group at Lindblom.

In regard to the math subtest, Lane program students were primarily
responsible for the total group gain as they improved 7.4 points from pre-
to posttest. Lindblom and Curie students gained 3.3 and 1.4 points,
respectively. Math gains for comparison group students were minimal, but
surprisingly, comparison group students at Curie had a higher gain score than
their summer program counterparts (1.6 to 1.4 points). Selection index score
gains were also highest for Lane program students as they gained 10 points
from pre- to posttest, twice the gain of their comparison group counterparts.
Lindblom students, who did not have comparison group counterparts, gained 8.1
points on their selection index scores, similar to the total group average.
Curie program students had, in addition to the lowest verbal and math gain 
scores, the lowest selection index gain scores (4.r> points). However, this 
gain was over three times higher than their comparison group counterparts. It 
should be noted that individual site results should be regarded with caution,
especially at Lindblom and Curie, because of relatively small sample sizes.
Despite this caution, it appears the program had a differential impact across
sites as both teachers and content were different. These differences should
be kept in mind when interpreting program effects.

Given the statistically significant gain scores of summer program
students, between-group analyses were conducted to determine substantive
program effects. Results indicated a significant overall multivariate
difference on verbal and math scores between summer program and comparison
group students (T² = 15.70, p<.001). However, there were no significant
differences between groups on the PSAT verbal subtest after accounting for
verbal and math pretest scores. In fact, the comparison group had a larger
gain score than the summer program group (1.9 to 1.5 points). However,
significant differences between groups on the PSAT math subtest remained even
after controlling for other background variables such as TAP reading, math,
and total scores, number of math and English courses taken in high school, and
overall high school math and English grades. Given that comparison group
students scored generally higher on the TAP, differences between groups became
even larger after analyses of covariance. Group differences on PSAT verbal
posttest scores remained nonsignificant after all analyses of covariance.
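
For readers unfamiliar with the multivariate statistic reported above, the
following sketch computes a two-sample Hotelling's T² for two outcome
variables (verbal and math). It is an illustration under stated assumptions:
the function name is hypothetical, the eight score pairs are invented and are
not data from this report, and the report does not say exactly which scores
entered the evaluators' computation.

```python
def hotelling_t2(group_a, group_b):
    """Two-sample Hotelling's T-squared for bivariate (verbal, math) data."""
    n1, n2 = len(group_a), len(group_b)

    def means(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    def sums_of_squares(rows, m):
        sxx = sum((r[0] - m[0]) ** 2 for r in rows)
        syy = sum((r[1] - m[1]) ** 2 for r in rows)
        sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        return sxx, syy, sxy

    ma, mb = means(group_a), means(group_b)
    a, b = sums_of_squares(group_a, ma), sums_of_squares(group_b, mb)

    # Pooled within-group covariance matrix [[sxx, sxy], [sxy, syy]].
    dof = n1 + n2 - 2
    sxx, syy, sxy = ((a[i] + b[i]) / dof for i in range(3))

    # Quadratic form d' S^-1 d, using the closed-form 2x2 inverse.
    d0, d1 = ma[0] - mb[0], ma[1] - mb[1]
    det = sxx * syy - sxy ** 2
    quad = (syy * d0 ** 2 - 2 * sxy * d0 * d1 + sxx * d1 ** 2) / det
    return (n1 * n2 / (n1 + n2)) * quad


# HYPOTHETICAL (verbal, math) scores, for illustration only.
comparison = [(40, 46), (42, 46), (40, 48), (42, 48)]
program = [(41, 48), (43, 48), (41, 50), (43, 50)]

t2 = hotelling_t2(program, comparison)
print(f"T-squared = {t2:.2f}")
```

T² generalizes the squared two-sample t statistic: it weighs the vector of
mean differences by the inverse of the pooled covariance matrix, so both
subtests are compared simultaneously rather than one at a time.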

As a result of the above analyses, an estimation of program effects can
be made on both the verbal and math PSAT subtests. This estimation, via
analysis of covariance with multiple covariates, adjusts for pre-existing
group differences on all included variables and generally increases the
precision of estimates of program effects (Cook & Campbell, 1979).
Independent subtest results indicated a positive treatment effect of 4.2 PSAT
points on the math subtest after adjusting for group differences on the math
practice pretest, 1985 TAP math scores, and 1985 TAP total scores. In other
words, if the two groups had started with the same math PSAT subtest, TAP
math, and TAP total scores, the summer program group would have significantly
outperformed the comparison group by 4.2 points on the PSAT math posttest.
Unfortunately, this positive treatment effect cannot be generalized to the
verbal subtest. In fact, results indicate a negative treatment effect on the
PSAT verbal subtest. That is, after accounting for initial group differences
on the math pretest, TAP math subtest, and TAP total test, the comparison
group outperformed the summer program group by approximately 1 PSAT point.
This difference was not significant. Thus, the verbal PSAT coaching had no
apparent effect in raising participating students' test scores.
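
The logic of a covariance-adjusted treatment effect can be sketched with a
single covariate. This is a simplification: the report's model used several
covariates, whereas the sketch below adjusts only for a pretest, and the
function name and all scores are hypothetical values invented for
illustration.

```python
def adjusted_difference(treat_x, treat_y, ctrl_x, ctrl_y):
    """Posttest difference adjusted for one covariate (classical ANCOVA)."""
    def mean(values):
        return sum(values) / len(values)

    mxt, myt, mxc, myc = mean(treat_x), mean(treat_y), mean(ctrl_x), mean(ctrl_y)

    # Pooled within-group regression slope of posttest on pretest.
    sxy = sum((x - mxt) * (y - myt) for x, y in zip(treat_x, treat_y))
    sxy += sum((x - mxc) * (y - myc) for x, y in zip(ctrl_x, ctrl_y))
    sxx = sum((x - mxt) ** 2 for x in treat_x) + sum((x - mxc) ** 2 for x in ctrl_x)
    slope = sxy / sxx

    raw = myt - myc
    # Remove the part of the raw difference explained by unequal pretests.
    return raw, raw - slope * (mxt - mxc)


# HYPOTHETICAL math pretest (x) and posttest (y) scores, illustration only.
program_pre, program_post = [48, 50, 52], [53, 55, 57]
comparison_pre, comparison_post = [50, 52, 54], [51, 53, 55]

raw_diff, adj_diff = adjusted_difference(
    program_pre, program_post, comparison_pre, comparison_post
)
print(f"raw difference = {raw_diff:.1f}, adjusted difference = {adj_diff:.1f}")
```

In this invented example the comparison group starts with the higher pretest
mean, so the adjusted difference is larger than the raw one. That mirrors the
pattern described in the text, where group differences grew after the
analyses of covariance because comparison students scored higher at the
outset.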

It should be noted that many other variables were included in the
analysis of covariance model in order to adjust for pre-existing selection
differences between the groups, including number of math and English courses
taken; grades received; specific subscales on the TAP; and combinations of
TAP, math, and verbal pretest scores. These added variables did not improve
the precision of treatment effect estimates or the prediction of verbal and
math posttest scores.

PSAT Scores by Stanine Group

Table 4 provides a breakdown of PSAT scores by reading and math stanine
groups. Paired t-tests showed significant pre- to posttest gains for nearly
all TAP reading and math stanine groups.
As is also shown, there are clear divisions in test scores between stanine
groups. Each stanine group received higher PSAT subtest scores than the
preceding group. Referring to the TAP reading stanine breakdown, ninth
stanine students made the only significant pre-posttest gain on the verbal
subtest, although stanine seven students also had a nearly identical gain.
In regard to pre- and posttest math scores, students in all reading stanines
made highly significant gains, but ninth stanine students obtained the highest
math gain score (5.7 points). Ninth stanine students also obtained the
highest average selection index gain score (10.1 points).

A similar pattern emerged in comparing PSAT scores by TAP math stanine.
Again, ninth stanine students made the only significant average
verbal subtest gain score (3.7 points), while stanine seven and eight
students' gain scores were less than one point. Regarding math subtest
scores, math stanine seven students obtained the highest average gain score
(6.7 points),



Table 4

Mean Practice PSAT Scores by TAP Reading and Math Stanine Groups
(N=93)

                    TAP READING STANINES           TAP MATHEMATICS STANINES
Test/Occasion       7         8         9          7         8         9          99 (a)
                              (n=37)               (n=37)    (n=34)               (n=10)

Verbal Pre          38.2      44.3      46.3       40.9      43.7      44.6       49.2
Verbal Post         40.5      44.6      48.5       41.8      44.6      48.3       51.7
Gain                 2.3       0.3       2.2*       0.9       0.9       3.7**      2.5

Math Pre            48.6      46.8      52.0       43.1      50.1      56.4       59.3
Math Post           52.8      52.4      57.7       49.8      53.4      61.9       64.1
Gain                 4.2***    5.6***    5.7***     6.7***    3.3**     5.5***     4.8*

SI (b) Pre         125.0     135.4     144.6      124.8     137.5     145.6      157.7
SI (b) Post        133.8     141.6     154.7      133.3     142.6     158.5      167.5
Gain                 8.8**     6.2**    10.1***     8.5***    5.1*     12.9***     9.8***

(a) This group scored in the ninth stanine in reading and math.
(b) Selection Index (2*verbal + math)

  * p<.05
 ** p<.01
*** p<.001



slightly higher than ninth stanine students (5.5 points), and twice as high
as eighth stanine students (3.3 points). However, the large actual score
differences between stanine groups should not be forgotten, as math subtest
scores increased by six to seven points for each stanine level. Similarly,
math stanine nine students also received the highest selection index average
gain score (12.9 points), although all stanine groups made significant
pre-posttest gains. Eighth stanine students obtained the lowest selection
index score gain (5.1 points). Thus, the data generally indicate fairly high
gain scores for seventh stanine students, somewhat lower gains for eighth
stanine students, and the highest gains for ninth stanine students.

Of additional interest are PSAT results of students in the ninth reading
and math stanines. As shown, they received the highest subtest scores of any
group but not the highest gain scores. Although their verbal gain
score was higher than that of nearly all other groups (2.5 points), it was not
statistically significant. Math (4.8 points) and selection index (9.8
points) gain scores were statistically significant although they were not
quite as high as those of TAP reading or math ninth stanine students. It
should be noted that the relatively small sample size of this group limits
the above results.




Thus, ninth stanine TAP reading and math students appear to take the most
advantage of the program, as their gain scores are generally higher than those
of other groups. Given the goal of producing more National Merit scholars,
ninth stanine math and reading students are the most likely to satisfy such a
goal, given that their selection index scores are at least 9 points higher
than those of any other stanine group.

Analysis within the Summer Program Group

Also of interest in this investigation were differences in PSAT performance
by various student subgroups defined by gender, PSAT experience, and
reasons for enrolling in the course. In regard to gender differences, male
and female program students scored similarly on pre- and posttests as there
were no statistically significant differences on verbal or math subtests.
Male and female students had identical verbal pretest scores (42.8) and
similar verbal posttest scores (43.7 and 44.8, respectively), while male
students had slightly higher math pretest (50.0 to 47.4) and posttest scores
(54.0 to 53.6). With the exception of the written expression TAP subtest,
on which female program students scored significantly higher than their male
counterparts, groups were similar on all background variables such as year in
school and TAP reading and math subtests.

PSAT experience appeared to have a greater effect on scores than gender.
Program students who had taken the PSAT before pretested significantly higher
on the verbal (t=2.60, p<.01) and math (t=4.19, p<.0001) subtests of the PSAT
than students who had not taken the test previously. This gap was narrowed by
the posttest as there were no significant differences between groups on the
verbal subtest and smaller differences on the math subtest (t=2.19, p<.03).
However, students who had previous PSAT experience were a higher achieving and
older group than novice test-takers as they scored significantly higher than
other students on the TAP math subtest, the total TAP, and the TAP basic
subtest. They also were significantly older and had taken more math courses
than their counterparts without PSAT experience.

It was also of interest to compare test scores by the nature of
participation in the program. For example, it was hypothesized that students
who enrolled in the program specifically to improve their PSAT scores would
have greater gains than other students, such as those who entered the program
only because their parents expected them to or thought it would be a good way
to meet other students. Results indicated no significant differences in verbal
or math subtest scores between students who enrolled only because their
parents expected them to and those who enrolled for other reasons. There
were also no differences between students who enrolled because they wanted to
do better on tests and those who enrolled because they didn't have anything
else planned for the summer or thought it would be a good way to meet people.

Supplemental Reports of Program Effects

To supplement pre- posttest data, student and teacher questionnaires were 
completed during the last week of the program. 

Student questionnaire. Results of the 101 student questionnaires, not
necessarily including those in the matched program group, indicated that
students rated the program quite positively on a scale from one to four, with
four being the most positive. For example, of the 22 items assessing general
program content, students thought the English (3.4), math (3.4), and guidance
components were helpful in preparing for the PSAT. Their vocabulary (3.4) and
problem-solving (3.2) skills were improved, and as a result of the program
they were better prepared to take the October 1986 PSAT (3.4).

Less positively assessed features of the program were the relevance of
the guidance component (2.8), newness of the materials (2.5), and meeting
every day (2.7). Although students were divided, they indicated that taking
the practice tests was somewhat more effective than class instruction in
preparing for the PSAT (2.7).

A second dimension assessed by the questionnaire was the effectiveness
of the instruction materials. Students, on the average, indicated the
instruction materials were effective (3.5) and rated the primary instruction
materials in the following order: Barron's How to Prepare for the PSAT/NMSQT
(3.4), Barron's Strategies for the ,W (3.1), and Inside Strategies for the
SAT (3.0). It should be noted that individual copies of these materials were
not provided, so they are assessed only in terms of their classroom use.

A third function of the questionnaire was to ascertain the kinds of
test-taking strategies learned in the program. Students responded to a series
of 12 testwiseness items and rated them on a scale from one to four. Students
agreed that many concepts were learned in the program including short-cuts to
doing PSAT problems (3.4), using time wisely (3.5), the importance of knowing
algebraic and geometric operations, and understanding how the PSAT is designed
(3.4). More specifically, students stated their level of agreement with a
number of principles taught in Inside Strategies for the SAT. Eighty-three
(83) percent correctly disagreed or strongly disagreed "that the only way to
do well on the vocabulary test is to memorize as many words as possible."
Only 25% of the students strongly disagreed. In reference to the question
regarding reading directions to the PSAT if already known, 70% of the students
correctly disagreed or strongly disagreed that directions should be read in
this instance. More convincingly, 96% of the students agreed or strongly
agreed that test choices should be tried in reverse order when the answer to
an item is not known. In response to an item about guessing, 70% of program
students agreed or strongly agreed that they should usually guess when they
don't know the answer to a problem. However, too many students disagreed
(30%), suggesting they would not usually guess in this instance.

Unfortunately, not all item responses indicated that students learned what
they read or were taught. For example, 60% of the students agreed that they
should first read the questions following the passage before reading it,
although Gruber recommended in Inside Strategies for the SAT that the passage
should be read first.

Questionnaire responses were also compared between Lane site students and
those at the other sites in order to help explain gain score differences
between sites. The only item found to be significantly higher in favor of
Lane students regarded guessing. Lane students learned to a greater degree
than others that they should usually guess when they come to an item for which
they don't know the answer (t=4.67, p<.0001). Thus, the higher math gain
scores of Lane students were not entirely explained through questionnaire
responses.



Thus, for the most part, students learned important test-taking 
principles that will help them on the PSAT, although in some cases, such as 
reading directions and guessing, a substantial number of students misconstrued 
proper testwiseness principles. 

Teacher questionnaire. Teacher survey responses reinforced the diversity
of instruction materials and strategies used in the program. In addition to
using the assigned materials, teachers indicated they supplemented instruction
with test preparation materials, college guidebooks, and mathematics and
English textbooks. These materials were employed for a variety of instruction
activities including lecture, discussion, demonstration, independent work, and
innovative activities (e.g., games and group exercises). The frequency of
these activities (measured on a scale from one to four with four indicating
daily use) varied considerably as discussion (3.5), independent seatwork
(3.0), and oral recitation (2.8) were used more frequently than demonstrations
(2.5) and lectures and presentations (1.2). Corresponding abilities taught
with these materials and activities were primarily test-taking skills,
listening, reading, study skills, stress management, and problem solving.

However, the use and frequency of such materials and activities varied by
component. Not surprisingly, English class activities emphasized discussion,
oral recitation, and seatwork with the objective being vocabulary and reading
development. Supplemental vocabulary handouts and practice exercises were
used toward this end. The mathematics component centered on test practice and
drill toward the goal of improving problem-solving, computation, and
test-taking skills. Supplemental materials such as math workbooks were also
used. For example, at Lane the math instructor administered past PSATs every
week and provided feedback after each test as the primary instruction focus.
On the other hand, guidance component activities stressed test-taking study
skills and other skills not specifically related to test preparation such as
stress management, image building, and college and career planning.
Instruction activities most frequently used were discussion, independent
seatwork, and demonstration. A plethora of materials was employed in addition
to test-taking materials and included counseling and college planning guides
and career development manuals. Group exercises were also employed toward the
goal of personal development (e.g., group dynamics, leadership skills, and
values clarification).

Teachers also assessed the effectiveness of the primary instruction
materials and reported on a scale from one to four that Barron's test
preparation book was most effective (3.4), while the other primary source
book, Inside Strategies for the SAT, was rated lower (2.7). However, teachers
noted that the latter would have been more effective if copies were available
to students (e.g., "would have been better if every student had a copy").
Barron's Strategies for Taking Tests and Mathematics for the College Boards
were also rated positively, although some teachers did not rate their
effectiveness.

Mastery of program objectives was assessed in a variety of ways but most 
frequently by in-class homework assignments, participation in class 
discussion, teacher-made tests, and oral recitation. For example, the English 
class at Curie had daily vocabulary quizzes to familiarize students with word 
meanings. 



In addition to documenting the techniques and materials used in the
program, teachers noted strengths and weaknesses of the program and made
recommendations toward its improvement.

In regard to limitations of program implementation, teachers reported
that regularity of student attendance (eight teachers so indicated),
coordination between components and with the central office, and student
interest in class presented problems in the program. They also voiced concern
over planning aspects of the program including criteria for student selection,
recruitment procedures, ability of students recruited, and the quality of the
teacher inservice. In regard to the criteria for selection, one teacher noted
the importance of the "selection of students with high G.P.A.'s as well as
high test scores... It takes a combination to be successful in entering
college."

In contrast, strengths of the program noted by teachers included its
positive emphasis on test-taking strategies, the small class size, and the
general concept of test preparation. For example, one teacher indicated, "The
concept is very good. The end results were very promising," while another
stated that the "Small classes added intimacy and allowed for personal
interaction."

Recommendations made about the program were numerous and included
improving the criteria of selection by using grades and teacher
recommendations, shortening the program to an after-school program, recruiting
students much earlier in the spring, providing more teacher in-service
sessions, improving the coordination with the central office, and giving
students academic credit for participating. Most of these recommendations have
been made in earlier reports and must be resolved before another program is
implemented.

October 1986 Results

To determine the accuracy and stability of PSAT practice tests, results
of the October 1986 PSAT were solicited for summer program and comparison
group students. These results, as well as matched pre- and posttest results,
are listed in Table 6. As can be seen, summer program students had a mean
verbal scaled score of 46.5 compared to 48.4 for comparison group students.
These scores indicate posttest-to-October 1986 test score gains of 2.1 and 1.8
points, respectively, slightly higher than pre-posttest gain scores. Summer
program (54.1) and comparison group (53.1) mean math scaled scores represented
0.8 and 1.6 point gains from the posttest. This general upward movement of
test scores is not consistent with 1985 findings and suggests, especially for
the verbal subtest, that regular fall academic school work helped improve test
scores more than the program or that motivation to do well was more prevalent
on the October test. These results should be read with the small sample sizes
in mind.
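
The gains discussed above can be recomputed directly from the mean scores
reported in Table 6. The short sketch below does only that arithmetic; the
dictionary layout and function name are illustrative choices, not part of the
evaluation, and the comparison group verbal posttest mean is taken as 46.6,
the value implied by the reported 1.8 point gain to 48.4.

```python
# Mean scaled scores transcribed from Table 6 (students who took the
# October 1986 PSAT). The 46.6 entry is inferred from the reported
# 1.8 point gain to 48.4.
TABLE_6 = {
    ("summer", "verbal"): {"pre": 43.0, "post": 44.4, "october": 46.5},
    ("summer", "math"): {"pre": 49.3, "post": 53.3, "october": 54.1},
    ("comparison", "verbal"): {"pre": 44.9, "post": 46.6, "october": 48.4},
    ("comparison", "math"): {"pre": 51.3, "post": 51.5, "october": 53.1},
}


def gains(scores):
    """Return (pre-to-post gain, post-to-October gain), rounded to 0.1."""
    pre_post = round(scores["post"] - scores["pre"], 1)
    post_october = round(scores["october"] - scores["post"], 1)
    return pre_post, post_october


for (group, subtest), scores in TABLE_6.items():
    pre_post, post_october = gains(scores)
    print(f"{group:10s} {subtest:6s} pre-post {pre_post:4.1f}  "
          f"post-October {post_october:4.1f}")
```

For this subsample, the July-to-October verbal gains (2.1 and 1.8) exceed the
corresponding June-to-July gains (1.4 and 1.7), which is the pattern the text
describes.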






Table 6

Mean PSAT Scores of Students Taking the October 1986 Test by Group

                          June 1986   July 1986    October 1986   July to
Group            Test     (Pretest)   (Posttest)   (Actual)       October Gain
------------------------------------------------------------------------------
Summer Program   Verbal   43.0        44.4         46.5           2.1
(n=72)                    (7.9)       (6.9)        (8.2)

                 Math     49.3        53.3         54.1           0.8
                          (7.6)       (7.1)        (7.7)

Comparison       Verbal   44.9        46.6         48.4           1.8
(n=37)                    (6.1)       (6.8)        (6.8)

                 Math     51.3        51.5         53.1           1.6
                          (7.3)       (7.2)        (6.4)


Discussion 

The major evaluation question addressed was whether summer program
students outperformed comparison group students from pre- to posttest and
whether this performance can be attributed to the program. Results indicated a
significant positive program effect for the PSAT math subtest after accounting
for PSAT math pretest scores and TAP math and total scores. This was
especially apparent at Lane, where program students gained 7.1 PSAT points over
a comparison group. Relatively small sample sizes precluded interpretation of
program effects at the other sites.
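
An effect estimated "after accounting for" pretest scores is a
covariance-adjusted difference between groups. The sketch below illustrates
the idea with a single covariate, using the classic formula in which the
posttest difference is reduced by the pooled within-group slope times the
pretest difference. It is a simplified illustration only: the evaluation used
several covariates, the function name is hypothetical, and the scores are
made-up values constructed so that the built-in effect is exactly 4.0 points.

```python
from statistics import mean


def adjusted_effect(pre_t, post_t, pre_c, post_c):
    """Covariate-adjusted treatment effect for one covariate.

    effect = (posttest mean difference) - b_w * (pretest mean difference),
    where b_w is the pooled within-group slope of posttest on pretest.
    """
    def sums(pre, post):
        mx, my = mean(pre), mean(post)
        sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx = sum((x - mx) ** 2 for x in pre)
        return mx, my, sxy, sxx

    mx_t, my_t, sxy_t, sxx_t = sums(pre_t, post_t)
    mx_c, my_c, sxy_c, sxx_c = sums(pre_c, post_c)
    b_w = (sxy_t + sxy_c) / (sxx_t + sxx_c)  # pooled within-group slope
    return (my_t - my_c) - b_w * (mx_t - mx_c)


# Hypothetical scores, NOT data from the report: posttest is built as
# 5 + 0.9 * pretest, plus 4.0 points for the treated group only.
pre_treated = [44.0, 48.0, 50.0, 55.0]
post_treated = [5 + 0.9 * x + 4.0 for x in pre_treated]
pre_comparison = [46.0, 51.0, 53.0, 57.0]
post_comparison = [5 + 0.9 * x for x in pre_comparison]

effect = adjusted_effect(pre_treated, post_treated,
                         pre_comparison, post_comparison)
print(round(effect, 2))
```

Because the treated group in this example starts lower on the pretest, the raw
posttest difference understates the built-in effect; the adjustment recovers
it. This mirrors the situation in the report, where the summer program group
scored lower than the comparison group on the pretest.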

In regard to the PSAT verbal subtest, there were no overall significant
differences between summer program and comparison groups on the PSAT verbal
posttest after accounting for differences on the verbal pretest, TAP reading
and total test scores, high school English grades, number of high school
English courses taken, and PSAT math pretest scores. In fact, results showed
an overall negative program effect for the verbal subtest, as comparison group
students obtained higher gain scores from pre- to posttest than summer program
students, especially at the Lane program site. The estimated overall program
effect from analysis of covariance was -0.7 PSAT verbal score points, or a loss
by the summer program group relative to the comparison group of approximately
7 points on the SAT scale. Although this loss is statistically nonsignificant,
it does indicate the effectiveness of the verbal coaching component of
the program was poor.
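
The report expresses PSAT effects on the SAT scale by multiplying by ten
(for example, -0.7 PSAT points is described as approximately 7 SAT points).
A minimal sketch of that conversion follows; the function name is
illustrative.

```python
def psat_to_sat_points(psat_points):
    """Convert a PSAT score difference to SAT-scale points (PSAT x 10)."""
    return round(psat_points * 10, 1)


print(psat_to_sat_points(-0.7))  # overall verbal program effect
print(psat_to_sat_points(4.2))   # overall math program effect
print(psat_to_sat_points(7.1))   # math program effect at Lane
```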

The inconsistent results obtained across sites suggest that the program
was implemented differently at each site. For example, at the Lane site the
instructor emphasized systematic practice-testing and feedback with past PSATs
that was not apparent at the other sites. Differences in teacher experience
and knowledge of test preparation may have also played a role. The small
sample sizes at the non-Lane sites may have further exacerbated observed
results.




While these different results should be kept in mind, they should not
dissuade us from interpreting verbal and math coaching effects. The primary
question to be addressed here is the validity of attributing the positive and
negative program effects to the test preparation program.

Establishing the Validity of Observed Program Effects

The nonequivalent control group design with pretest and posttest measures
used in this study is a generally interpretable design for establishing the
internal validity of observed results. Results are internally valid when
program effects can be attributed directly to the program and alternative
explanations of program effects are ruled out. The present design typically
rules out many threats to internal and external validity that may bias
estimates of program effects. Following Cook & Campbell's (1979) method of
determining the internal and external validity of observed program effects,
the primary validity threats are tested individually for plausibility so that
the nature of test score gains can be found. Elimination of all or most
threats to internal validity lends support to the attribution of test score
gains to program effects, while elimination of external validity threats would
support the representativeness of present findings to other student
populations and settings. The primary concern here will be the observed PSAT
math coaching effects, especially at Lane, since the positive evaluation
results were observed there. However, interpretation of no program effects for
the PSAT verbal coaching component will be discussed separately. The
discussion will begin with internal validity threats.

Threats to Internal Validity Ruled Out 

Based on the design and results of the study, the following threats may 
be ruled out: 

History. This threat occurs when criterion scores of an experimental
group are influenced by forces outside the context of the program, such as
other people or other instruction. The present control group design generally
accounts for this threat, as it is assumed outside forces are influencing both
treatment and comparison groups equally. This potential effect would then be
controlled in the pre-posttest data. History is a special concern in
education because students are continuously exposed to an amalgamation of
instruction programs, all of which may influence each other. In the present
study, this threat is further neutralized by the relatively short duration and
isolated conditions (summer school) of the program. History effects occur
most frequently with longer running programs.

Maturation. This threat is of concern when an observed treatment effect
may be due to the general maturing process (i.e., growing older, wiser, or
more experienced) rather than the program. As with history, maturation
effects usually take place over a long period of time and would rarely occur
over a six-week period. Further, the present design generally controls for
maturation effects, and it is unlikely differential effects were operating
between groups.

Testing. Testing effects may occur when observed program effects are the
result of participants becoming more familiar with a criterion test, such as
when the same test is used for all testing occasions. Again, the use of a
pre-posttest control group design usually accounts for this potential rival
hypothesis, because both groups would take advantage of such a situation. In
the present study, this threat is further mitigated by the use of alternate
test forms from pre- to posttest.

Instrumentation. This threat typically occurs when the measuring
instrument is changed in some way between the pre- and posttest or when groups
exhibit "floor" or "ceiling" effects. Both effects appear to be controlled in
this study. The alternate-form PSATs used produce identical linear standard
scores that are statistically equated. "Floor" or "ceiling" effects were not
an issue because mean test scores hovered around the middle range of the PSAT
scale.

Statistical regression. This threat generally refers to the upward
movement of a pretested experimental group toward its population mean at the
posttest, which is mistaken for a treatment effect rather than a statistical
artifact. The threat is most ominous when the treatment group has a much lower
pretest score than the comparison group and criterion measures are unreliable.
The latter condition is not plausible since the PSAT is a nationally
standardized test that has high test-retest reliability. The former concern is
also minimized because (1) the summer program and comparison groups did not
perform significantly differently on the math pretest to render regression a
major threat and (2) on the math test, the lower-scoring summer program group
overtook the comparison group in a crossover fashion by the posttest. As
explained by Cook & Campbell (1979), this outcome reduces the likelihood of a
regression alternative explanation because it is not reasonable to expect the
summer program group to regress above the posttest score of the comparison
group.

Mortality. Mortality is a threat to observed program effects when a
substantial number of students drop out of the program after the pretest and
it is found that those students who drop out have different characteristics
than students who stayed in the program. When attrition is high, sample
representativeness is compromised and estimated treatment effects become
biased. Although evaluation data were reduced to include only those students
who completed pre- and posttests, consequently eliminating this validity
threat, substantial reduction of the program sample is problematic. Sixty-four
(64) percent of those students pretested participated to some degree in
the program and took the posttest, a fairly positive retention rate. Of the
comparison group, only 38% completed both pre- and posttests, indicating the
interest and motivation of this reduced group in improving their test scores.
However, it was found that those students who dropped out of the program after
the pretest received significantly lower verbal and math subtest scores than
the final summer program group.

Other major threats to internal validity ruled out. Other plausible
alternative hypotheses to observed PSAT math coaching effects ruled out in
this study include (1) diffusion of treatment, (2) compensatory equalization
of treatments, (3) compensatory rivalry, and (4) resentful demoralization.
These threats are usually used to explain minimal or no program effects and
involve contamination between experimental and comparison groups. Thus, they
are not directly applicable to math coaching results. The most likely threat
in this study would be resentful demoralization, whereby the comparison group
reacts negatively to its no-treatment status and deliberately lowers its test
performance. This would have the effect of inflating the estimated positive
program effect. However, given the exclusively voluntary nature of the
comparison group, it is unlikely these students resented their no-treatment
status.

Threats Not Ruled Out as Explanations for Program Effects 

Selection. Selection is a threat and potential explanation for observed
program effects when there are preexisting differences between experimental
and comparison groups that cannot be measured or controlled. It is especially
a problem when recruitment procedures of groups are different and assignment
of participants to experimental and comparison groups is not controlled.
Although the use of analysis of covariance in the present study controlled the
effects of some measured characteristics between groups, such as practice
pretests and achievement test scores, it cannot control for unmeasured
characteristics that may be different between groups. These other unmeasured
characteristics may have influenced the positive math coaching effects
obtained.

Interest in improving PSAT scores and motivation to do well appear to be
the primary threats as a result of selection. The fact that summer program
students participated in an in-depth test preparation program suggests they
were more interested and motivated in improving their scores than the
comparison group. The magnitude of this effect is unknown, but it surely must
be considered in interpreting the validity of the observed math coaching
effect. Fortunately, though, this motivation/interest effect is minimized by
two findings. First, comparison group students were obtained on the basis of
their interest in improving their own PSAT scores. The choice to take two
practice tests was completely voluntary and indicated they were also
interested in improving their test scores. Many of these students did not
enroll in the program because they did not have time or did not receive a
letter inviting them to participate, not because they were uninterested. In
addition, nearly all comparison group students indicated they would be taking
the October 1986 test.

Second, the plausible explanation that summer program students were more
interested in improving their test scores may be offset by the fact that the
comparison group was a higher achieving group than the summer program group.
They scored higher on all achievement test measures and pretests. This
information should also be taken into account.

Thus, the probability of the observed math coaching effect being the
result of selection differences is reduced by the explicit interest of the
comparison group in improving its PSAT scores and the possibility that it is a
higher achieving group. The magnitude of this effect remains uncertain, and
if this threat exists at all, it is most likely a modest one, but it cannot be
ruled out.

Interactions with selection. These threats occur when selection
differences cannot be ruled out and resulting differences may be combining
with other internal validity threats such as maturation, testing, and
history. These threats are difficult to estimate for nonequivalent groups
because all selection differences cannot be measured or obtained.
Selection-instrumentation can be eliminated quickly, as both groups scored at
approximately equal intervals on the PSAT; thus, results could be interpreted
similarly. Selection-maturation effects are minimized by the relatively short
duration of the program and the finding that within-group variances decreased
from the pretest to the posttest for both groups. This suggests that the
selection-maturation threat was at most minimal. In the latter case it is
assumed that if selection-maturation is operating, then differential growth
between groups should be occurring within groups as well, and within-group
variances do not indicate this occurrence (Cook & Campbell, 1979).

However, two other interaction threats cannot be dismissed so easily:
selection-history and selection-testing. Selection-history, the most serious
threat, occurs most typically in disseminated treatment programs where sites
receive different kinds of instruction and program effects are concentrated
at one particular site. Such is the case with the math coaching component,
since the primary treatment effect was concentrated at Lane. One must ask if
there was some specific event or local history that enabled students at this
site to gain over 7 points. The other two sites, although they were composed
of much smaller numbers of students, showed gains of only 3.3 and 1.4 points,
respectively. In addition, all three groups started with nearly identical
math pretest scores. Taking out math PSAT scores at the Lane site reduces the
average gain score from 5.2 to 2.3 and the unadjusted program effect from 4.6
to 0.7 points.

Thus, the apparent site-specific math PSAT program effect indicates that 
specific content information at Lane may have been responsible for the major 
program effect and not the program in general. As previously noted, teacher 
characteristics may have also played a part in observed test score differences 
between sites. Although the teachers recruited for the program were 
experienced in their particular subject areas, the Lane site instructor may 
have been more adept in preparing students for the PSAT. 

Selection-testing. An ironic feature of this evaluation is that the
effectiveness of a test preparation program is determined by tests. Thus, the
summer program students are exposed to more test practice and past PSAT tests
than their comparison group counterparts. This increased exposure to tests
per se, rather than the instruction of the program, may have been responsible
for observed program effects. Although it can be argued that test practice
and completion of former PSATs are an intimate part of the program
instruction, determination of which component is the most effective in
improving test scores is a highly relevant issue and cannot be resolved in
this study. In self-reported student questionnaire data, 30% of
those surveyed indicated that the tests were more important than the
instruction in learning about the SAT.

The Status of the Verbal Coaching Component

Results indicated a small but negative program effect on the verbal PSAT
from pre- to posttest. Thus, the verbal component may have been a detriment
to summer program students. However, the estimated loss was less than 1 PSAT
point. This negative and unusual result is difficult to explain, as internal
validity threats are not applicable to this result. One possible explanation
is selection. Enrollment in the program favored those students who had not
made summer plans; thus, it is probable the comparison group is more
academically involved than their program counterparts. Consistently higher
pre- and posttest and achievement test scores support the greater academic
experience of this group.




However, considering that over 25 hours were devoted to the verbal
component, resulting in an average gain of 1.5 points, this must be viewed as a
failure of the program and its administration, especially since the goal of
the program is to produce National Merit scholars from primarily above average
students. In all fairness, it should be noted that the verbal subtest is less
amenable to coaching effects than the math subtest.

Threats to External Validity

The purpose of a nonequivalent control group design is to eliminate most
threats to internal validity and establish a program effect. It is not
particularly well designed to produce externally valid results, or results that
would be similar across other students and settings. The three major threats
to external validity described by Cook & Campbell (1979) are the (1)
interaction of selection and treatment, (2) interaction of setting and
treatment, and (3) reactive arrangements. The latter two threats appear to be
ruled out in this study. The interaction of setting and treatment, or treatment
effects varying with the setting, can be generally ruled out on the grounds
that educational test preparation programs are intended for a homogeneous
setting, the classroom, and, all other things equal, would probably vary
minimally across such environments. Also ruled out is reactive arrangements,
the probability that participants in other educational settings will react
differently to the program in ways that change the magnitude of program
effects. This threat is greatest in laboratory studies where results are
obtained in contrived and artificial settings. This is not the case with
classroom research, as testing and program instruction are regular features of
education.

Unfortunately, the most relevant threat to be ruled out,
selection-treatment, cannot be. This external validity threat limits the
generalizability of results when program/treatment students are not
representative of students-at-large. Obviously, program students are a
selective student group in regard to their motivation to do well on the PSAT
and as test-takers. Program students were in the upper third of the TAP
achievement test distribution; thus, results cannot be generalized to students
testing lower than this restricted range. It should also be noted that coaching
research has not considered restricted student groups in determining program
effects.

Generalizability of criterion scores. While positive coaching effects
were found on the math subtest, the nature of these effects and their
meaningfulness are uncertain. As Anastasi (1981) has indicated, it is not
clear whether test score gains due to coaching also result in improved
criterion score performance. The major purpose of test preparation programs,
especially on ETS tests, is to learn the test's structure, how it is designed,
testwiseness strategies, and effective response patterns, and not substantive
content training. Thus, it is uncertain whether a student who scored 52 on
the PSAT after a test preparation program would have the same performance
capacity as someone who scored 52 without test preparation. The relatively
small sample size of the study should also be considered, especially at the
Curie and Lindblom sites. Generalization of such results should be made
with caution.








The Status of the Program



On the basis of the above discussion, it is possible to interpret the
observed math coaching effect. Even though most threats to internal validity,
such as history, maturation, testing, mortality, instrumentation, and to a
degree, selection, may be ruled out as influencing observed effects, a general
program effect did not exist. The primary positive effect of the program was
observed at Lane, where gain scores can be attributed to the systematic
practice testing and analysis of past PSATs and possibly the teacher. Thus,
the positive program effect is site-specific but substantial. The fact that
students coached at Lane gained 7.1 PSAT or 71 SAT points over their
comparison group strongly indicates the positive effect of practice testing,
drill, and feedback. Small samples at the other two sites preclude
interpretation of program effects.

These results support two major conclusions regarding this program: (1)
the desirability of using former PSATs as instruction materials almost
exclusively for practice and drill and (2) the need to establish clearer and
more uniform guidelines for instruction. The use of past PSAT forms serves
the relevant function of familiarizing students with actual PSAT items rather
than approximated PSAT items, thus giving students a greater sense of the
type of items they can expect on the test.

In regard to instruction uniformity, it is imperative that future test 
preparation programs use a standardized curriculum for instruction, especially 
when the program is disseminated to multiple sites. The primary advantage of 
a standardized curriculum is that it minimizes teacher and content differences 
across sites. It also functions as a guide or "road map" for teachers to 
systematically follow and adhere to when they are not very familiar with the 
material. Also important is the selection and training of teachers familiar 
with the nuances of ETS-developed tests. 

The status of the verbal coaching program is bleak. Results indicated a
negative program effect rather than a positive one, although the magnitude is
not statistically significant. While this result may have been affected by
the nonequivalence of the groups, it is unlikely the effect was major. Thus,
the content of the verbal coaching must be changed significantly. A more
direct emphasis on test practice and drill with former PSATs is suggested.
The current practice of using materials that have little relation to the PSAT
(i.e., Barron's materials, English texts) should be eliminated in favor of
consistent drill under actual testing conditions and analysis of responses.
If the goal of coaching is to improve test scores only, then coaching should
correspond as closely as possible to the content of the test. The public
availability of past PSAT and SAT forms enables this compatibility to be high.
Comparability with Other Studies 

The results of this study, although inconsistent across sites, support
the positive effects of PSAT math coaching. The overall estimated math
coaching effect of 4.2 PSAT points (42 SAT points or 71 points for Lane
students) is much greater than the 10 to 15 point gains reported in the SAT
review literature (DerSimonian & Laird, 1983; Slack & Porter, 1980), although
negligible results were found at the other comparison group site. The present
results do not conform to the logarithmic model of Messick and Jungeblut
(1983), as it was calculated that a math score gain of 40 points would require
approximately 107 hours of instruction. Approximately 30 hours of math
instruction was included in the program, indicating significantly greater
time-effectiveness than other studies. This was especially apparent at Lane,
where summer program students gained 7.1 points over the comparison group.
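
The time-effectiveness comparison above can be made concrete as SAT points
gained per hour of instruction. The sketch below uses only figures stated in
this report (a 42 SAT point math effect in approximately 30 hours, against 40
points in approximately 107 hours calculated from the model); the function
name is illustrative, and the logarithmic model itself is not reproduced here.

```python
def sat_points_per_hour(sat_points, hours):
    """Return SAT points gained per hour of instruction."""
    return sat_points / hours


# Overall math coaching effect and approximate hours of math instruction.
observed = sat_points_per_hour(42, 30)
# Gain and hours the report calculated from the logarithmic model.
model = sat_points_per_hour(40, 107)

print(round(observed, 2), round(model, 2), round(observed / model, 1))
```

On these figures the program's observed rate is between three and four times
the rate implied by the model calculation, subject to the small-sample and
site-specific cautions noted elsewhere in this report.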

The site-specific math program effect, then, does not undermine the
positive coaching effects, although the nature of these effects is uncertain.
These results are consistent with the 1984 and 1985 findings in which program
students gained 4.4 and 3.5 PSAT points, respectively, from pre- to posttest
(Chicago Public Schools, 1986). In regard to the verbal component, the
present finding (1.5 PSAT gain score) is not consistent with the 1985 and 1984
results, as it was found that program students gained 3.3 and 7.1 PSAT points,
respectively. These differences, as well as the site-to-site differences of
1986, may be explained, in part, by the nonstandardization of materials and
teacher characteristics over the past three years.

Summary 

TPPGTS was generally observed to be implemented as designed and
satisfactory to both teacher and student participants. Teachers indicated the
positive emphasis on test-taking strategies and small class size facilitated
an effective program, while students noted they were better
prepared to take the October 1986 PSAT. However, teachers suggested numerous
changes be made, such as shortening the program and changing the criteria for
student selection.

Students also indicated that the guidance component was not compatible
with test preparation and that meeting every day was cumbersome. Test results
showed differentiated math and verbal subtest gain scores across sites,
although verbal gain scores were negligible between comparison and summer
program groups. A highly significant math program effect was found at Lane
that completely accounted for the total group effect. It was found that the
nature of the program at this site was different from the other two, as
past PSATs were used systematically.






Recommendations



Based on the above results, the following recommendations are made about 
TPPGTS: 

1. Emphasize practice and drill with past PSATs. The tremendous math
gain scores of the practice-test dominated instruction at Lane
further support the effectiveness of practice and review of past
PSATs in improving test scores. Future programs should use this
strategy in both verbal and math components. Again, as summarized by
Anastasi (1981), the closer the correspondence between the program
and the test situation, the higher will be the gain score. However,
this also results in limited improvement of criterion behavior, that
is, doing well in course work and school in general.

2. Train teachers in the most effective ways to coach the test. Because
effective teachers are essential to the success of any test
preparation program, it is imperative they are trained in the
intricacies of test coaching. Knowledge of how the test is
developed, what skills are tested, and testwiseness principles is
necessary for instructors to understand and teach. More extensive
training sessions about the PSAT or SAT should be provided for the
teachers.

3. Standardize the instruction and materials. Divergent results across
sites may be attributed to the flexible use of materials and
instruction methods by teachers. Teachers were able to structure their
curriculum as they saw fit. However, since the usefulness of various
instruction practices has been supported, those practices should be
of top priority for implementation in the classroom. Thus, by
structuring the program along the lines of practice and drill with
past PSATs, the uniformity of the program may be established and the
diversity of instruction minimized across sites.

4. Improve the student selection criteria. As has been discussed in
previous reports, the exclusive reliance on TAP scores as the
selection criterion undermines the identification of gifted students.
Other criteria such as grades, teacher/counselor recommendations, and
interest are also important for selecting students for the program.

A consensus of teachers also indicated a problem with the selection
criteria. However, altering the selection criteria will require more
coordination between the schools and the central office. This will
necessitate an earlier student selection process.

5. Refine the identification and selection of gifted students to be
served by this program. Evaluation results indicate that ninth
stanine students in reading and math outperformed other students on
the PSAT and came closest to meeting the selection index cutoff
score. They scored, on the average, at least 9 points higher than
any other stanine group. If the goal of the program is limited to
increasing the number of National Merit scholars, ninth stanine
students appear to have the best opportunity for achieving this
objective.






6. Change the program to a general SAT/ACT preparation program. As
discussed in the summary evaluation report (Chicago Public Schools,
1986), the present goal of increasing the number of National Merit
scholars has not been satisfied, as nearly all students who
participate in the program are well below the PSAT selection index
cutoff score of approximately 200. A more realistic and influential
program goal would be to focus on general score improvement on the
primary college entrance examinations, the SAT or the American
College Test (ACT). Coaching for these tests would have a much
greater impact on college entrance and could also provide more
scholarship money for students to pay for college. Giving the
general student population an opportunity to participate could
increase the benefit of the program.

7. Give students an incentive for participating in the program. One way
to increase the participation of gifted students and/or students in
general is to offer some incentive such as academic credit, a
certificate or letter of completion, or some such reinforcement that
will, at least partially, improve the motivation of students to
enroll and stay in the program. Many teachers also recommended this
change in light of the fact that they indicated student attendance and
motivation were problems.

Since coaching programs produce short-term gains and do not improve 
cognitive skills necessary for doing well in college, emphasis should 
be given to programs that develop long-term cognitive skills. The 
educational significance of improving college entrance examination 
scores may be limited to the test score itself. The SAT and PSAT 
measure a very limited set of abilities, and if overemphasized, 
downplay essential competency skills needed for learning. Coaching 
concentrates on testwiseness strategies and idiosyncratic qualities 
of tests rather than the development of cognitive skills. Rather 
than provide short-term gains on a test of questionable predictive 
validity, instructional programs on cognitive skills and problem- 
solving will provide the most effective foundation for academic 
success. 






References 

Anastasi, A. (1981). Coaching, test sophistication and developed abilities. 
American Psychologist, 36 (10), 1086-1093. 

Chicago Public Schools (1986). A review of test preparation programs for 
gifted and talented sophomores. 1983-198B. Department of Research and 
Evaluation. Chicago, IL: Author. 

College Entrance Examination Board (1985). A counselor's guide to helping 
students learn from the PSAT/NMSQT. Philadelphia, PA: Author. 

Cook, T.D. and Campbell, D.T. (1979). Quasi-experimentation: Design and 
analysis issues for field settings. Boston: Houghton Mifflin Company. 

DerSimonian, R. and Laird, N. (1983). Evaluating the effect of coaching on 
SAT scores: A meta-analysis. Harvard Educational Review, 53, 1-15. 

Dear, R.E. (1958). The effects of a program of intensive coaching on SAT 
scores (RB58.5). Princeton: Educational Testing Service. 

Dyer, H.S. (1953). Does coaching help? College Board Review, 19, 331-335. 

French, J.W. (1955). An answer to test coaching. College Board Review. 



Hays, W.L. (1981). Statistics. New York: Holt, Rinehart and Winston. 

Hotelling, H. (1931). The generalization of Student's ratio. Annals of 
Mathematical Statistics, 2, 360-378. 

Marascuilo, L.A. and Levin, J.R. (1983). Multivariate Statistics in the 
Social Sciences. Monterey, CA: Brooks/Cole. 

Messick, S. and Jungeblut, A. (1981). Time and method in coaching for the 
SAT. Psychological Bulletin, 89 (2), 191-216. 

Slack, W.V. and Porter, D. (1980). The scholastic aptitude test: A critical 
appraisal. Harvard Educational Review, 50, 154-175. 






